Finding gene modules that correlate with neurite growth I

In this script I perform some exploratory analysis of the dataset using the Seurat r-package. In the first part I follow exactly along the lines of this tutorial: https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html Given all steps are discussed in the tutorial, I do not discuss them here. In the last part, I check whether there is any evidence of gene modules that are correlated with neurite growth, using a simple principal component analysis. I conclude that in a next step, we should try more advanced methods for gene module detection and that we may also need more data to discover meaningful connection to neurite growth.

Load the required R libraries:

library(readxl)
library(dplyr)
library(Seurat)
library(myUtils)
library(EnsDb.Mmusculus.v75)
library(lubridate)
library(ComplexHeatmap)

Here I 1.) load the data 2.) change the gene identifiers from ensemble ids to symbols, where this is possible (otherwise I leave the ensembl id in place) 3.) load the metadata 4.) I put the data and metadata into a Seurat object. 5.) I also load mitochondrial mouse genes from a database for later use.

directory = '/home/jovyan/axonGrowth/'
setwd(directory)
axonGrowth_counts = read.csv('counts.csv', row.names = 1)
rownames(axonGrowth_counts) = unlist(lapply(1:length(rownames(axonGrowth_counts)), function(x) strsplit(rownames(axonGrowth_counts)[x], split = '\\.')[[1]][1]))
symbols = mapIdsMouse(rownames(axonGrowth_counts), 'ENSEMBL', 'SYMBOL')
mitogenes <- genes(EnsDb.Mmusculus.v75, filter = ~ seq_name == "MT")$gene_id
percent.mt = colSums(axonGrowth_counts[rownames(axonGrowth_counts) %in% mitogenes,])/colSums(axonGrowth_counts)
rownames(axonGrowth_counts) = symbols
#sum(unlist(lapply(1:dim(axonGrowth_counts)[1], function(x) substring(rownames(axonGrowth_counts)[x], 1,3))) != 'ENS')/dim(axonGrowth_counts)[1]
metadata = read_xlsx('Summary-neurons_microcoverslips_RNAseq.xlsx', col_types = c('text', 'text', 'date', 'date', 'date', 'date', 'date', rep('text', 7)), skip = 1)
metadata = metadata[10:105,]
axonGrowth = CreateSeuratObject(axonGrowth_counts, project = 'axonGrowth', min.cells = 0, min.features = 0)
axonGrowth$TotalNeuriteLength = as.numeric(metadata$`Total Neurite Length (um)`)
axonGrowth$LongestNeuriteLength = as.numeric(metadata$`Longest Neurite Length (um)`)
axonGrowth$TimeSincePlating = unlist(lapply(1:dim(metadata)[1], function(x) interval(ymd_hms(metadata$`Time of plating (mm/dd/yyyy hh:mm)`[x]), ymd_hms(metadata$`Cell collection time (mm/dd/yyyy hh:mm)`[x]))/dhours(1)))
axonGrowth$TimeSincePlating = unlist(lapply(1:dim(metadata)[1], function(x) interval(ymd_hms(metadata$`Time of plating (mm/dd/yyyy hh:mm)`[x]), ymd_hms(metadata$`Cell collection time (mm/dd/yyyy hh:mm)`[x]))/dhours(1)))
axonGrowth$TimeInAIRMEM = unlist(lapply(1:dim(metadata)[1], function(x) interval(ymd_hms(metadata$`AIR-MEM change (mm/dd/yyyy hh:mm)`[x]), ymd_hms(metadata$`Cell collection time (mm/dd/yyyy hh:mm)`[x]))/dhours(1)))

The QC plots using number of detected genes, number of counts and percent of counts coming from mitochondrial genes (as a proxy for stress), show a couple of outlier cells, which I remove in the last line:

axonGrowth[["percent.mt"]] <- percent.mt
VlnPlot(axonGrowth, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

plot1 <- FeatureScatter(axonGrowth, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(axonGrowth, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
CombinePlots(plots = list(plot1, plot2))

axonGrowth <- subset(axonGrowth, subset = nFeature_RNA > 10000 & nFeature_RNA < 20000 & percent.mt < 2.5 & nCount_RNA > 10^5 & nCount_RNA < 10^6)
VlnPlot(axonGrowth, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

The following normalizes, scales and select 1000 particularly variable genes. Doing the downstream analysis with only the 1000 most variable genes seems like a waste of information. However, given we only have 85 high quality cells available for this analysis at the moment, it will give us the most robust clustering results.

axonGrowth <- NormalizeData(axonGrowth, normalization.method = "LogNormalize", scale.factor = 10000)
axonGrowth <- FindVariableFeatures(axonGrowth, selection.method = "vst", nfeatures = 1000)
top25 <- head(VariableFeatures(axonGrowth), 25)
plot1 <- VariableFeaturePlot(axonGrowth)
plot2 <- LabelPoints(plot = plot1, points = top25, repel = TRUE)
CombinePlots(plots = list(plot1, plot2))

all.genes <- rownames(axonGrowth)
axonGrowth <- ScaleData(axonGrowth, features = all.genes)

These are the results of the PCA analysis:

axonGrowth <- RunPCA(axonGrowth, features = VariableFeatures(object = axonGrowth))
## PC_ 1 
## Positive:  ENSMUSG00000105285, ENSMUSG00000111824, ENSMUSG00000092805, Tmem170, ENSMUSG00000106451, Cct8l1, ENSMUSG00000111551, ENSMUSG00000109438, Trp53i11, S1pr3 
##     ENSMUSG00000112012, ENSMUSG00000112018, Col1a2, Olfr92, ENSMUSG00000108105, ENSMUSG00000102467, ENSMUSG00000104498, 1700066M21Rik, Mettl21c, ENSMUSG00000113067 
##     ENSMUSG00000106915, Glt8d2, Lpar1, Lhpp, Olig1, Tas2r103, Kcne4, ENSMUSG00000097959, ENSMUSG00000113451, ENSMUSG00000113545 
## Negative:  Ppia, Rtn3, Hnrnpk, Calm1, ENSMUSG00000044285, Ddx5, Eef1a1, Hspa8, Ubb, Mif 
##     Actb, H3f3b, Prkar1a, ENSMUSG00000062933, Cfl1, Eif4a1, Gnb1, Ap2m1, Hsp90ab1, Eif1 
##     Rpl4, Ywhaq, Ywhae, Eif4g2, Cnn3, Calm2, Eif5a, Atp5b, Atp6v0c, Arf1 
## PC_ 2 
## Positive:  Cyba, Ctss, Lyz2, Tyrobp, Hexb, Lgals3, Fcer1g, Cd68, Laptm5, Spp1 
##     Capg, Lgmn, Gpnmb, Anxa1, Trem2, Creg1, Lipa, B2m, Ctsb, Clec4d 
##     Arpc1b, Anxa5, Cd53, Igf1, Ctsz, Cybb, Colec12, Ctsl, Ctsd, Mpeg1 
## Negative:  Stmn4, Mapt, Nrep, Map1b, Stmn2, Gap43, Acat2, Fdps, Stmn3, Scg5 
##     Crmp1, Snrpn, Sqle, Nsg1, Dpysl3, Syt4, Kif5c, Stmn1, Rtn1, Tubb2b 
##     Nnat, Gng2, Dpysl5, Sox11, Fasn, Tuba1a, Klc1, Clip3, Fdft1, Trim2 
## PC_ 3 
## Positive:  Cldn3, Rpl17, Smurf2, Prxl2a, Zfp334, Cbx5, Peli1, Ttc3, Id3, Bet1 
##     ENSMUSG00000096972, Bdh1, H2afy2, Clu, Igfbpl1, Enc1, Cdh2, Adnp, Nfix, Mturn 
##     Lrrc57, Otud6b, Epha4, Serpinf1, Rmnd5b, Spryd4, Zbtb20, Pmvk, Ppp1r16a, Vmp1 
## Negative:  Syt3, Lao1, Tspan12, Ado, Pcdhb16, Ppp1r37, Sema5b, Maneal, Dus4l, ENSMUSG00000100801 
##     Chac1, Pex5, Dcdc2a, Trib1, Trim32, Spock1, Gpt2, Fam216a, Bhlhb9, ENSMUSG00000103733 
##     Zfp830, Per2, Zfp207, Dner, G6pdx, Sdf2, Reep1, Smad1, Ppif, Fgf13 
## PC_ 4 
## Positive:  Zscan18, Rtf2, Kctd2, Ddit3, Rps6kc1, Tagap1, Cog2, Bet1, Bap1, Elovl4 
##     Hist1h2bc, Nars, Dram2, Oat, Gab1, Sec23a, Snx2, Timm8a1, Slc7a3, Adnp 
##     Atp6v0d1, Ctnnbl1, Gins2, Htra2, Pex19, ENSMUSG00000101878, Tmed3, Rheb, Gad1, Ruvbl2 
## Negative:  Zdhhc8, Acap3, Stard7, Pgm1, Aatk, Gtf2f1, Kat14, Kctd15, Rlbp1, Sema3c 
##     Pcdhga8, Pde1b, Tbc1d19, Mfsd1, Fam3a, ENSMUSG00000108456, Fam210b, 2810408A11Rik, Aldh4a1, Sestd1 
##     Supt16, Ankrd54, Nt5c3, ENSMUSG00000106386, Flot1, Dbn1, Sez6, Mecp2, ENSMUSG00000084858, Chst2 
## PC_ 5 
## Positive:  ENSMUSG00000096972, Slc19a1, Ndufv1, Atg101, Irf3, Nprl2, Hmces, Fcho1, Lrrc57, Pdrg1 
##     Smox, Med21, Gne, Mrps2, Tbc1d10a, Nfkbib, Kcnt1, Pitrm1, Ddx18, Rtl8b 
##     Wdr3, Pepd, Spcs3, Fmod, Slc35a4, Tmem50b, Ppp1r16a, Dnm1l, Spsb3, ENSMUSG00000108693 
## Negative:  Gadd45g, Slc38a2, Plppr1, Gli3, Cdpf1, Eif3l, 1500015O10Rik, Tollip, Dll1, Hspb6 
##     Lzts1, Lgalsl, Axin1, Pim3, Hat1, Cog2, Ppp1r15a, Scamp3, Pcyox1, Fam114a2 
##     Gad1, Gab1, Tagap1, Otud5, Gphn, Igsf21, Id3, Nkiras2, Ptprn, Tmem38a
print(axonGrowth[["pca"]], dims = 1:5, nfeatures = 5)
## PC_ 1 
## Positive:  ENSMUSG00000105285, ENSMUSG00000111824, ENSMUSG00000092805, Tmem170, ENSMUSG00000106451 
## Negative:  Ppia, Rtn3, Hnrnpk, Calm1, ENSMUSG00000044285 
## PC_ 2 
## Positive:  Cyba, Ctss, Lyz2, Tyrobp, Hexb 
## Negative:  Stmn4, Mapt, Nrep, Map1b, Stmn2 
## PC_ 3 
## Positive:  Cldn3, Rpl17, Smurf2, Prxl2a, Zfp334 
## Negative:  Syt3, Lao1, Tspan12, Ado, Pcdhb16 
## PC_ 4 
## Positive:  Zscan18, Rtf2, Kctd2, Ddit3, Rps6kc1 
## Negative:  Zdhhc8, Acap3, Stard7, Pgm1, Aatk 
## PC_ 5 
## Positive:  ENSMUSG00000096972, Slc19a1, Ndufv1, Atg101, Irf3 
## Negative:  Gadd45g, Slc38a2, Plppr1, Gli3, Cdpf1
VizDimLoadings(axonGrowth, dims = 1:2, reduction = "pca")

DimPlot(axonGrowth, reduction = "pca")

DimHeatmap(axonGrowth, dims = 1, cells = round(dim(axonGrowth)[2])/2, balanced = TRUE)

DimHeatmap(axonGrowth, dims = 1:15, cells = round(dim(axonGrowth)[2])/2, balanced = TRUE)

Based on the JackStraw procedure I select 12 PCs for further analysis:

axonGrowth <- JackStraw(axonGrowth, num.replicate = 100)
axonGrowth <- ScoreJackStraw(axonGrowth, dims = 1:20)
JackStrawPlot(axonGrowth, dims = 1:20)

ElbowPlot(axonGrowth, ndims = 40)

This is the clustering step:

axonGrowth <- FindNeighbors(axonGrowth, dims = 1:12)
axonGrowth <- FindClusters(axonGrowth, resolution = 0.5)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 77
## Number of edges: 1843
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.6267
## Number of communities: 2
## Elapsed time: 0 seconds
head(Idents(axonGrowth), 5)
## X8397 X8386 X8417 X8416 X8404 
##     0     1     1     1     0 
## Levels: 0 1

Clusters roughly agree with visual seperation on a UMAP plot:

axonGrowth <- RunUMAP(axonGrowth, dims = 1:12)
DimPlot(axonGrowth, reduction = "umap")

saveRDS(axonGrowth, file = "axonGrowth_SeuratTutorial.rds")

Here we find the top cluster markers. Surprisingly, the second cluster is defined entirely by markers that still have their original ensembl identifiers. This means the database I used could not find HUGO gene symbol matches for these genes. This happens for many genes, however the discrepancy between clusters is suspicious. Checking some of the identifiers such as ENSMUSG00000075015, shows that it is a ‘predicted gene’ named Gm10801 and this is the case for most of the identifiers. So maybe there is a technical reason for this clustering, although I could not quite work out what it is.

cluster1.markers <- FindMarkers(axonGrowth, ident.1 = 1, min.pct = 0.25)
head(cluster1.markers, n = 5)
##                           p_val  avg_logFC pct.1 pct.2    p_val_adj
## Eef1a1             1.107667e-18 -1.6739573 0.971     1 5.382042e-14
## ENSMUSG00000095891 2.284133e-18  0.5707442 1.000     1 1.109837e-13
## Actb               5.289256e-17 -1.9273380 0.971     1 2.569997e-12
## Ppia               1.108689e-16 -1.1357768 1.000     1 5.387009e-12
## Cfl1               1.108689e-16 -1.4979116 1.000     1 5.387009e-12
axonGrowth.markers <- FindAllMarkers(axonGrowth, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
axonGrowth.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
## # A tibble: 4 x 7
## # Groups:   cluster [2]
##      p_val avg_logFC pct.1 pct.2   p_val_adj cluster gene              
##      <dbl>     <dbl> <dbl> <dbl>       <dbl> <fct>   <chr>             
## 1 5.76e- 9     2.12   0.93 0.676 0.000280    0       ENSMUSG00000105361
## 2 9.13e- 6     2.27   1    0.912 0.444       0       Fth1              
## 3 2.36e-12     0.714  1    1     0.000000115 1       ENSMUSG00000095186
## 4 3.29e-12     0.669  1    1     0.000000160 1       ENSMUSG00000075015
VlnPlot(axonGrowth, features = c("Actb", "Tuba1a"))

VlnPlot(axonGrowth, features = c("ENSMUSG00000075015", "ENSMUSG00000095186"), slot = "counts", log = TRUE)

FeaturePlot(axonGrowth, features = c("Actb", "Tuba1a","ENSMUSG00000075015", "ENSMUSG00000095186"))

top25 <- axonGrowth.markers %>% group_by(cluster) %>% top_n(n = 25, wt = avg_logFC)
DoHeatmap(axonGrowth, features = top25$gene, cells = 1:dim(axonGrowth)[2] + NoLegend())

as.matrix(top25)
##       p_val          avg_logFC   pct.1   pct.2   p_val_adj      cluster
##  [1,] "1.107667e-18" "1.6739573" "1.000" "0.971" "5.382042e-14" "0"    
##  [2,] "5.289256e-17" "1.9273380" "1.000" "0.971" "2.569997e-12" "0"    
##  [3,] "1.108689e-16" "1.4979116" "1.000" "1.000" "5.387009e-12" "0"    
##  [4,] "1.892083e-16" "1.6966431" "1.000" "1.000" "9.193443e-12" "0"    
##  [5,] "1.892083e-16" "1.6577344" "1.000" "0.971" "9.193443e-12" "0"    
##  [6,] "2.678079e-16" "1.3756303" "1.000" "1.000" "1.301252e-11" "0"    
##  [7,] "6.768952e-13" "1.4381487" "1.000" "0.971" "3.288966e-08" "0"    
##  [8,] "1.075390e-12" "2.1128930" "1.000" "1.000" "5.225214e-08" "0"    
##  [9,] "1.406107e-12" "1.6307610" "1.000" "0.912" "6.832135e-08" "0"    
## [10,] "2.534517e-12" "1.4860743" "1.000" "0.912" "1.231497e-07" "0"    
## [11,] "6.264909e-12" "1.3604397" "1.000" "1.000" "3.044057e-07" "0"    
## [12,] "9.290134e-11" "1.4588588" "1.000" "0.941" "4.513983e-06" "0"    
## [13,] "1.481118e-10" "1.6672043" "1.000" "1.000" "7.196605e-06" "0"    
## [14,] "5.955054e-10" "1.2884649" "1.000" "0.853" "2.893501e-05" "0"    
## [15,] "1.427421e-09" "1.4876734" "0.977" "0.971" "6.935698e-05" "0"    
## [16,] "3.295693e-09" "1.9697066" "1.000" "0.971" "1.601344e-04" "0"    
## [17,] "3.341609e-09" "1.4054179" "1.000" "0.941" "1.623654e-04" "0"    
## [18,] "5.760648e-09" "2.1231684" "0.930" "0.676" "2.799041e-04" "0"    
## [19,] "6.717473e-09" "1.4194644" "0.953" "0.647" "3.263953e-04" "0"    
## [20,] "6.904425e-07" "1.3124041" "0.930" "0.735" "3.354791e-02" "0"    
## [21,] "6.430492e-06" "1.5277421" "0.907" "0.706" "3.124512e-01" "0"    
## [22,] "9.131769e-06" "2.2692169" "1.000" "0.912" "4.437035e-01" "0"    
## [23,] "2.649126e-05" "1.4861146" "0.884" "0.559" "1.000000e+00" "0"    
## [24,] "9.452719e-05" "1.3102317" "0.837" "0.529" "1.000000e+00" "0"    
## [25,] "1.325866e-03" "1.9809037" "0.930" "0.765" "1.000000e+00" "0"    
## [26,] "2.284133e-18" "0.5707442" "1.000" "1.000" "1.109837e-13" "1"    
## [27,] "2.678079e-16" "0.6112766" "1.000" "1.000" "1.301252e-11" "1"    
## [28,] "8.590836e-16" "0.5053282" "1.000" "1.000" "4.174201e-11" "1"    
## [29,] "1.620365e-15" "0.5248217" "1.000" "1.000" "7.873192e-11" "1"    
## [30,] "9.685709e-15" "0.5650088" "1.000" "1.000" "4.706189e-10" "1"    
## [31,] "5.009885e-14" "0.6355427" "1.000" "1.000" "2.434253e-09" "1"    
## [32,] "7.411082e-14" "0.5410112" "1.000" "1.000" "3.600971e-09" "1"    
## [33,] "2.302990e-13" "0.5586470" "1.000" "1.000" "1.119000e-08" "1"    
## [34,] "4.219141e-13" "0.6430716" "1.000" "1.000" "2.050039e-08" "1"    
## [35,] "2.364523e-12" "0.7138238" "1.000" "1.000" "1.148898e-07" "1"    
## [36,] "3.287175e-12" "0.6687285" "1.000" "1.000" "1.597205e-07" "1"    
## [37,] "1.765477e-11" "0.6491111" "1.000" "1.000" "8.578277e-07" "1"    
## [38,] "6.759221e-10" "0.3999625" "1.000" "1.000" "3.284238e-05" "1"    
## [39,] "4.904093e-07" "0.3437786" "1.000" "1.000" "2.382850e-02" "1"    
## [40,] "1.776965e-06" "0.3578967" "1.000" "1.000" "8.634093e-02" "1"    
## [41,] "2.888748e-05" "0.3683632" "1.000" "1.000" "1.000000e+00" "1"    
## [42,] "4.515597e-05" "0.2750459" "1.000" "1.000" "1.000000e+00" "1"    
## [43,] "5.225807e-05" "0.2512317" "1.000" "1.000" "1.000000e+00" "1"    
## [44,] "6.511454e-04" "0.3040399" "1.000" "1.000" "1.000000e+00" "1"    
## [45,] "2.075924e-03" "0.4150399" "0.971" "0.837" "1.000000e+00" "1"    
## [46,] "6.941902e-03" "0.3758401" "0.618" "0.349" "1.000000e+00" "1"    
##       gene                
##  [1,] "Eef1a1"            
##  [2,] "Actb"              
##  [3,] "Cfl1"              
##  [4,] "ENSMUSG00000106106"
##  [5,] "Ubb"               
##  [6,] "Calm1"             
##  [7,] "Tmsb4x"            
##  [8,] "Ftl1"              
##  [9,] "Hspa8"             
## [10,] "ENSMUSG00000044285"
## [11,] "Gnb1"              
## [12,] "Dpysl2"            
## [13,] "Prdx1"             
## [14,] "Ubc"               
## [15,] "Tubb5"             
## [16,] "Tuba1a"            
## [17,] "COX1"              
## [18,] "ENSMUSG00000105361"
## [19,] "Aldoa"             
## [20,] "Itm2b"             
## [21,] "Psap"              
## [22,] "Fth1"              
## [23,] "Igfbp2"            
## [24,] "B2m"               
## [25,] "Ctsd"              
## [26,] "ENSMUSG00000095891"
## [27,] "ENSMUSG00000095547"
## [28,] "ENSMUSG00000096385"
## [29,] "ENSMUSG00000096736"
## [30,] "ENSMUSG00000074564"
## [31,] "ENSMUSG00000095280"
## [32,] "ENSMUSG00000096519"
## [33,] "ENSMUSG00000091028"
## [34,] "ENSMUSG00000097312"
## [35,] "ENSMUSG00000095186"
## [36,] "ENSMUSG00000075015"
## [37,] "ENSMUSG00000075014"
## [38,] "ENSMUSG00000096201"
## [39,] "ENSMUSG00000103322"
## [40,] "ENSMUSG00000112158"
## [41,] "ENSMUSG00000114367"
## [42,] "ENSMUSG00000113795"
## [43,] "Olfr46"            
## [44,] "ENSMUSG00000113161"
## [45,] "ENSMUSG00000113393"
## [46,] "ENSMUSG00000105285"

For the dataset we have clustering might not be the best option to discover interesting heterogeneity. Rather than discrete clusters we might expect continuous variations along biologically meaningful dimensions, such as neurite growth. Indeed, the main question we are interested in is whether there are certain genes that correlate in their expression with neurite growth. Testing this relationship for more than 20000 genes seperately is not sensible with only about 90 cells we have available for this analysis (due to multiple testing corrections). However, we can see if any of the first 5 principal components correlate strongly with either ‘LongestNeuriteLength’, ‘TotalNeuriteLength’ or technical covariates such as ‘TimeSincePlating’, ‘TimeInAirMem’, ‘nCount_RNA’, ‘nFeature_RNA’, ‘percent.mt’. The results of this analysis are shown below, visualized with a heatmap:

FeaturePlot(axonGrowth, features = c('TotalNeuriteLength', 'LongestNeuriteLength'))

FeaturePlot(axonGrowth, features = c('Gfap', 'Mapt'))

correlationScores = data.frame(matrix(0,12,7))
colnames(correlationScores) = c('LongestNeuriteLength', 'TotalNeuriteLength', 'TimeSincePlating', 'TimeInAirMem', 'nCount_RNA', 'nFeature_RNA', 'percent.mt')
  
for (i in 1:12){
  correlationScores[i,] = c(cor(axonGrowth@reductions$pca@cell.embeddings[,i], axonGrowth$LongestNeuriteLength, use = "complete.obs"), cor(axonGrowth@reductions$pca@cell.embeddings[,i], axonGrowth$TotalNeuriteLength, use = "complete.obs"),
          cor(axonGrowth@reductions$pca@cell.embeddings[,i], axonGrowth$TimeSincePlating, use = "complete.obs"),cor(axonGrowth@reductions$pca@cell.embeddings[,i], axonGrowth$TimeInAIRMEM, use = "complete.obs"),
          cor(axonGrowth@reductions$pca@cell.embeddings[,i], axonGrowth$nCount_RNA, use = "complete.obs"),cor(axonGrowth@reductions$pca@cell.embeddings[,i], axonGrowth$nFeature_RNA, use = "complete.obs"),
          cor(axonGrowth@reductions$pca@cell.embeddings[,i], axonGrowth$percent.mt, use = "complete.obs"))
}

temp = as.matrix(correlationScores)
rownames(temp) = paste('PCA', 1:12, sep = ' ')

Heatmap(temp, cluster_rows = FALSE)

Now principal components 1 might show some relationship to our biological variable of interest, while PC1 is dominated by technical factors. Plotting the PC1 score vs. TotalNeuriteLength or LongestNeuriteLength does not show a very convincing relationship:

for (gene in c('Gfap', 'Mapt', 'Gap43', 'Tubb3')){
col_fun <- colorRamp(c("grey", "blue"))
axonGrowth_counts = cpm(axonGrowth_counts)
norm = log2(axonGrowth_counts[,names(axonGrowth@reductions$pca@cell.embeddings[,1])][gene,] + 1)
norm = norm/max(norm)
rgb_cols <- col_fun(norm)
cols <- rgb(rgb_cols, maxColorValue = 256)

plot(axonGrowth$TotalNeuriteLength, axonGrowth@reductions$pca@cell.embeddings[,1], col = cols, pch = '.', cex = 10, ylab = 'PCA1-score', xlab = 'TotalNeuriteLength', main = gene)
plot(axonGrowth$LongestNeuriteLength, axonGrowth@reductions$pca@cell.embeddings[,1], col = cols,  pch = '.', cex = 10, ylab = 'PCA1-score', xlab = 'LongestNeuriteLength', main = gene)
}

I save the gene loadings for PC2 below:

write.csv(sort(Loadings(axonGrowth, reduction = 'pca')[,c('PC_2')], decreasing = TRUE), file = 'PC2_loadings.csv')

Principal component analysis is not optimal for identifying biologically meaningful gene modules in single cell data for two main reasons: 1.) It assumes all noise in the data is normally distributed. 2.) It assumes all gene modules are best described by ‘orthogonal’ (uncorrelated) dimensions. So in further analysis in the immediate future I will try alternative algorithms that that derive non-orthogonal modules (e.g. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559) and model noise appropriately (e.g. https://www.embopress.org/doi/full/10.15252/msb.20188557). Such more advanced analysis strategies and possibly more data in the future, should increase our power to discover gene modules associated with neurite growth.0.